Goals
a) Use the phylogeny of 16S ribosomal RNA (rRNA) sequences to determine the
evolutionary relationships among archaebacteria.

b) Ensure that 16S rRNA sequence data for the best-characterized
archaebacterial species are available to provide an overall phylogenetic
framework within which other species can be placed.

c) Maximize the reliability of the inferred phylogenies and define the
limitations of these phylogenies.

d) Actively work with other investigators to produce 16S rRNA sequence data
from the archaebacteria of greatest interest to them so that a unified body
of data will accumulate relating many diverse archaebacteria.


Accomplishments 


(numbers in text refer to the listing of publications and presentations below)

Because of the general applicability of the sequencing, data collection,
and data analysis techniques which form the core of our ONR-funded project,
there is substantial synergism between the ONR project and the other
phylogenetic and natural microbial population work in the laboratory. To
reflect the progress on the ONR project fully, portions of the overlapping
work are included below.

Rapid rRNA sequencing methodology

The techniques for rapidly determining partial 16S rRNA sequences by
dideoxynucleotide-terminated sequencing with reverse transcriptase from
"universal" 16S-rRNA-specific primers have been refined to yield more data per
molecule (in particular, by decreasing the number of sequence positions of
ambiguous identity). The current protocol will appear in a forthcoming volume
of Methods in Enzymology devoted to cyanobacteria (10). In addition, we
are approximately three-fourths of the way through writing a comprehensive
"laboratory manual" that explains all of the procedures (from RNA isolation
through phylogenetic analysis) at a level appropriate for non-molecular
biologists.

Sequencing efforts in the laboratory have focused on the accumulation of
partial sequences of 16S rRNAs from members of two eubacterial groups: the
sulfur-oxidizing bacteria (in collaboration with Arthur Harrison, University
of Missouri, Columbia) and the cyanobacteria (14, 17). Also, we have taught
the rapid rRNA sequencing technique to, and have collaborated with, Michael
Ghiselin (California Academy of Sciences), Rudolf Raff (Indiana University),
Elizabeth Raff (Indiana University), and Marilyn Milberger (Craig Nelson
laboratory, Indiana University) for the purpose of inferring the relationships
among the metazoan phyla (5, 11, 15, 20). Several additional investigators
have visited our laboratory for assistance in learning to do rapid rRNA
sequencing: Dan Distel (Scripps Institution of Oceanography), to study
sulfur-oxidizing symbionts of clams (13); Reinhardt Rossen (Institute for
Great Lakes Research), to study lake microbiology; Peggy Romero (Howard Gest
laboratory, Indiana University), to study an unusual group of photosynthetic
bacteria; Farooq Azam and Michelle Pontius-Brewer (both from Scripps
Institution of Oceanography), to study ocean microbiology; Jed Fuhrman (State
University of New York, Stony Brook), to study ocean microbiology; Colleen
Cavanaugh (Harvard University), to study sulfur- and methane-oxidizing
symbionts of marine invertebrates; and Tineke Burger-Wiersma (University of
Amsterdam), to study the free-living prochlorophyte Prochlorothrix hollandica.
We have also collaborated with Paul Romaniuk (University of Victoria, British
Columbia, Canada) on a phylogenetic analysis of the genus Campylobacter (8).

One reason for accumulating 16S rRNA sequences from diverse organisms is to
provide a database for use in phylogenetically "identifying" rRNA genes that
are isolated directly from the DNA present in the biomass of a natural
population (3, 4, 9, 19). The rRNA gene characterizations provide an overview
of the component organisms in the population without requiring laboratory
cultivation of the organisms. We had previously isolated (as recombinant
DNAs) the rRNA genes from the microbial community of a 91°C hot spring
(Octopus Hot Spring in Yellowstone National Park). Three rRNA genes (two
eubacterial and one archaebacterial) from this population have now been
partially sequenced and are nearly ready for phylogenetic analysis. We have
also assisted David Ward (Montana State University, Bozeman), who is using
these techniques to characterize the microbial mat communities associated with
Octopus Hot Spring.
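A first-pass way to "identify" an environmental rRNA gene is to rank the reference sequences in an aligned collection by their similarity to it. The sketch below assumes the cloned gene has already been aligned against the reference collection; it computes percent identity only over columns where both sequences carry an unambiguous nucleotide. The function names are hypothetical, and a nearest-reference score is not a phylogenetic placement; it only suggests which part of the tree to analyze.

```python
from typing import Dict, List, Tuple

# Only unambiguous nucleotides are compared; gaps and IUB ambiguity codes are skipped.
COMPARABLE = set("ACGU")


def percent_identity(query: str, reference: str) -> float:
    """Percent identity over aligned columns where both sequences have A, C, G, or U."""
    if len(query) != len(reference):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    # Treat T and U as the same nucleotide so DNA and RNA sequences compare directly.
    for q, r in zip(query.upper().replace("T", "U"), reference.upper().replace("T", "U")):
        if q not in COMPARABLE or r not in COMPARABLE:
            continue
        compared += 1
        matches += q == r
    return 100.0 * matches / compared if compared else 0.0


def rank_references(query: str, references: Dict[str, str]) -> List[Tuple[str, float]]:
    """Return (name, percent identity) pairs, most similar reference first."""
    scores = [(name, percent_identity(query, seq)) for name, seq in references.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

For example, with two toy references, `rank_references("ACGUUACGU", {"ref_1": "ACGU-ACGU", "ref_2": "ACGA-ACGA"})` returns `[("ref_1", 100.0), ("ref_2", 75.0)]`; the gapped fifth column is ignored.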

Analysis of archaebacterial phylogeny

Approximately 220 phylogenetically useful, partial or complete 16S rRNA
sequences from a wide variety of organisms and organelles have been compiled.

Analyses of the sequence relationships within the archaebacteria and of the
relationships of the archaebacteria to other organisms have been published
(1, 2, 16, 18). The analyses render untenable the suggestion of Lake and
colleagues (Lake et al., 1985) that the eubacteria derive from photosynthetic
archaebacteria (halobacteria). Instead, our analyses support a view in which
the archaebacteria form a distinct (holophyletic) group. (See below for a
discussion of alternative approaches to the sequence data analysis.)

Availability of analysis programs

We have written a detailed description of our tree inference method for
a Methods in Enzymology volume on ribosomes (7). Our phylogenetic tree
inference programs have been improved so that it is possible to examine
systematically the best alternatives to the "optimal tree." We are providing
copies of our sequence analysis and tree inference programs (which depend
on the VAX/VMS computing environment) to several institutions: the University
of Illinois, Urbana; Montana State University, Bozeman; the University of
Victoria; the National Jewish Center for Immunology and Respiratory Medicine,
Denver; the State University of New York, Stony Brook; the Dana-Farber Cancer
Research Center, Boston; and King's College, London.


Availability of 16S rRNA sequence data


The published 16S rRNA sequences in our data collection are available
either individually or in aligned form. We can provide them as a printed
copy, on nine-track tape, by dial-up connection to our MicroVAX (running the
VMS operating system), or by BITNET electronic mail. The aligned sequences
are supplied as a text file, arranged similarly to a published sequence
alignment. The nucleotides are given in the IUB-recommended representation,
and alignment gaps are represented by hyphens. Minor changes of format
could be made to accommodate other needs.
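The exact layout of the distributed text file is not specified here, so the sketch below assumes the sequences have already been read into a mapping from organism name to aligned sequence. It checks the conventions stated above (IUB nucleotide symbols, hyphens for gaps, equal aligned lengths) and, as one hypothetical example of a "minor change of format," writes the alignment in aligned FASTA. The function names are illustrative only.

```python
from typing import Dict

# IUB nucleotide symbols (A, C, G, T/U plus ambiguity codes); U is accepted for rRNA.
IUB_SYMBOLS = set("ACGTURYMKSWHBVDN")
GAP = "-"


def validate_alignment(alignment: Dict[str, str]) -> int:
    """Check equal lengths and legal symbols; return the common alignment length."""
    if not alignment:
        raise ValueError("empty alignment")
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError(f"aligned sequences differ in length: {sorted(lengths)}")
    for name, seq in alignment.items():
        for column, symbol in enumerate(seq.upper(), start=1):
            if symbol != GAP and symbol not in IUB_SYMBOLS:
                raise ValueError(f"{name}: symbol {symbol!r} at column {column} is not an IUB code")
    return lengths.pop()


def to_fasta(alignment: Dict[str, str], width: int = 60) -> str:
    """Write the alignment as aligned FASTA, keeping hyphens as gap characters."""
    validate_alignment(alignment)
    lines = []
    for name, seq in alignment.items():
        lines.append(f">{name}")
        # Wrap each sequence at a fixed width; gaps are preserved so columns stay aligned.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

Validating before writing means that any converted file keeps the column correspondence of the original alignment.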


Potential systematic errors in phylogenetic tree inference


We have investigated the potential for systematic errors in phylogenies
inferred from rRNA sequences that result from disparate rates of mutation
acceptance (different average "molecular clock" rates). The potential for
error in the inferred phylogenetic trees can be substantially decreased by
using a more realistic model of sequence evolution, one which acknowledges
that not all sites in the 16S rRNA are equally mutable (18). Specifically, we
have examined the effect of assuming that the relative substitution rates
across the 16S rRNA fit a log-normal distribution. For the approximately 950
positions that we routinely analyze, the empirically determined width of
the distribution is such that 95% of the sequence positions change at rates
between 1/8 and 8 times the median rate of change. Although initial
studies were based upon the relationships of mitochondrial rRNAs, the
observations have proven to be very general. Other groups in which
lineage-to-lineage differences in the rate of fixed mutation accumulation
potentially influence the accurate inference of phylogenetic relationships
include the archaebacteria (archaebacterial rRNAs have evolved more slowly
than the rRNAs of one or both other kingdoms, and among the archaebacteria
there are also substantial variations) (1), echinoderms (5, 11), the major
eubacterial divisions, and chloroplasts in relation to cyanobacteria (14, 17).
When we apply this alternative data treatment to the investigation of the
relationships of archaebacteria with eukaryotes and eubacteria, we arrive at
the same answer as we have in the past: the archaebacteria are distinct from
these two other groups.
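The quoted width pins down the log-normal spread: if 95% of positions fall between 1/8 and 8 times the median rate, then sigma = ln 8 / 1.96, about 1.06. The model that (18) combines with this distribution is not spelled out here, so the following is only a sketch under stated assumptions. Substitutions follow the Jukes-Cantor model, site rates are scaled to a mean of 1 (so a distance is expected substitutions per site), and the average over rates is taken by simple trapezoidal quadrature. All function names are hypothetical.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def lognormal_sigma(fold: float = 8.0) -> float:
    """Sigma such that 95% of log-normal rates lie within [median/fold, median*fold]."""
    return math.log(fold) / Z_975


def expected_p_distance(d: float, sigma: float, n: int = 2001, zmax: float = 8.0) -> float:
    """Expected fraction of differing sites at mean distance d (substitutions/site),
    assuming Jukes-Cantor change with log-normally distributed site rates of mean 1."""
    mu = -0.5 * sigma * sigma  # location parameter giving E[rate] = 1
    step = 2.0 * zmax / (n - 1)
    total = 0.0
    for i in range(n):
        z = -zmax + i * step
        weight = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        if i in (0, n - 1):
            weight *= 0.5  # trapezoid-rule end points
        rate = math.exp(mu + sigma * z)
        total += weight * 0.75 * (1.0 - math.exp(-4.0 * rate * d / 3.0))
    return total * step


def jukes_cantor_distance(p: float) -> float:
    """Classical equal-rates Jukes-Cantor distance, for comparison."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def lognormal_rate_distance(p: float, sigma: float, tol: float = 1e-10) -> float:
    """Distance whose expected p-distance equals the observed p (found by bisection)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    lo, hi = 0.0, 1.0
    for _ in range(60):  # expand the bracket until it contains the solution
        if expected_p_distance(hi, sigma) >= p:
            break
        hi *= 2.0
    else:
        raise ValueError("p is too close to saturation for this sigma")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_p_distance(mid, sigma) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Because the expected difference at a site is a concave function of its rate, rate variation always lowers the expected p-distance for a given true distance. The rate-corrected distance is therefore never smaller than the equal-rates Jukes-Cantor distance for the same observed p. The correction matters most for the most distant comparisons, which are the ones that matter for inter-kingdom relationships.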


Comparing the sensitivity of various tree inference methods to statistical error


There has been significant controversy regarding the "correct" method of
analysis of sequence data. We have taken initial steps toward a quantitative
comparison of the various techniques. We originally chose a distance matrix
method of phylogenetic tree inference because Schwartz and Dayhoff (1978) had
presented evidence that it is statistically superior to parsimony-type
analyses, and Felsenstein (1978) had demonstrated a significant source of
systematic error intrinsic to parsimony-based analyses. Because the Schwartz
and Dayhoff conclusion appears to have been cited rarely, we have performed
similar, but more exhaustive, simulations of phylogeny reconstruction by
parsimony, by a distance method, and by cluster analysis. These studies have
led us to essentially the same conclusions as Schwartz and Dayhoff, although
the superiority of the distance method is much smaller for nucleotide
sequences than for the amino acid sequence data considered in the previous
studies.
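The design of such a simulation can be sketched for the smallest interesting case, four taxa. The sketch is not our simulation protocol; it is a simplified illustration under stated assumptions. Sequences evolve under the Jukes-Cantor model on the tree ((0, 1), (2, 3)). Parsimony picks the pairing supported by the most sites of the form xxyy, which for four taxa is equivalent to minimizing the parsimony length. The distance method stands in for any distance matrix method: it picks the pairing with the smallest sum of Jukes-Cantor distances within pairs (the four-point criterion). Ties count as failures. The function names and parameters are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

BASES = "ACGU"
Split = Tuple[Tuple[int, int], Tuple[int, int]]
# The three unrooted topologies for four taxa, written as two pairs.
SPLITS: List[Split] = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]
TRUE_SPLIT = SPLITS[0]  # the simulated tree is ((0, 1), (2, 3))


def evolve(seq: str, length: float, rng: random.Random) -> str:
    """Evolve a sequence along one branch (substitutions/site) under Jukes-Cantor."""
    p_change = 0.75 * (1.0 - math.exp(-4.0 * length / 3.0))
    out = []
    for base in seq:
        if rng.random() < p_change:
            base = rng.choice([b for b in BASES if b != base])
        out.append(base)
    return "".join(out)


def simulate_quartet(peripheral: Sequence[float], internal: float,
                     sites: int, rng: random.Random) -> List[str]:
    left = "".join(rng.choice(BASES) for _ in range(sites))  # node joining taxa 0 and 1
    right = evolve(left, internal, rng)                      # node joining taxa 2 and 3
    return [evolve(left, peripheral[0], rng), evolve(left, peripheral[1], rng),
            evolve(right, peripheral[2], rng), evolve(right, peripheral[3], rng)]


def parsimony_choice(seqs: Sequence[str]) -> Optional[Split]:
    """Pairing supported by the most xxyy sites; None on a tie."""
    support = [sum(1 for a, b, c, d in zip(seqs[i], seqs[j], seqs[k], seqs[l])
                   if a == b and c == d and a != c)
               for (i, j), (k, l) in SPLITS]
    best = max(support)
    return None if support.count(best) > 1 else SPLITS[support.index(best)]


def jc_distance(x: str, y: str) -> float:
    p = sum(a != b for a, b in zip(x, y)) / len(x)
    return math.inf if p >= 0.75 else -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_choice(seqs: Sequence[str]) -> Optional[Split]:
    """Pairing with the smallest within-pair distance sum; None on a tie."""
    d = {(i, j): jc_distance(seqs[i], seqs[j]) for i in range(4) for j in range(i + 1, 4)}
    sums = [d[pair_a] + d[pair_b] for pair_a, pair_b in SPLITS]
    best = min(sums)
    return None if sums.count(best) > 1 else SPLITS[sums.index(best)]


def compare_methods(peripheral: Sequence[float], internal: float, sites: int,
                    trials: int, seed: int = 0) -> Tuple[float, float]:
    """Fraction of trials in which parsimony and the distance method recover the tree."""
    rng = random.Random(seed)
    correct_pars = correct_dist = 0
    for _ in range(trials):
        seqs = simulate_quartet(peripheral, internal, sites, rng)
        correct_pars += parsimony_choice(seqs) == TRUE_SPLIT
        correct_dist += distance_choice(seqs) == TRUE_SPLIT
    return correct_pars / trials, correct_dist / trials
```

With short peripheral branches both methods should almost always succeed, for example `compare_methods((0.05, 0.05, 0.05, 0.05), 0.2, sites=500, trials=20)`. The informative experiments use long, non-adjacent peripheral branches (taxa 0 and 2), the situation in which Felsenstein (1978) showed parsimony to be systematically misled.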


Alternative analysis methods applied to archaebacterial phylogeny


Wolters et al. (1986) have argued that a proper cladistic analysis of 16S
rRNA sequences reveals a specific relationship between eukaryotes and
thermophilic (sulfur-dependent) archaebacteria, to the exclusion of the
eubacteria, methanogens, and halophilic archaebacteria. By restricting their
analysis to slowly varying sequence positions (conveniently identified by
their lack of variation within well-established groups, i.e., positions that
are conserved among eukaryotic rRNAs and conserved among eubacterial rRNAs),
they limited the analysis to data for which parsimony-based methods should be
appropriate. Wolters et al. note three 16S rRNA sequence positions (1303,
1334, and 1408 in the Escherichia coli sequence) at which eukaryotic and
thermophilic archaebacterial rRNAs share a common nucleotide, while the other
sequences share a different nucleotide. Thus, these three positions are most
parsimoniously explained by a specific eukaryote/thermophilic-archaebacteria
relationship, a view previously expressed by Lake et al. (1984). However, it
is not clear how these authors can ignore the analogous nucleotide usage
patterns at positions 338, 367, 393, 923, 973, 1211, and 1393 (E. coli
numbering), which all specifically relate the eukaryotes to the eubacteria and
relate all the archaebacteria to one another. Thus, the balance of the
evidence in a parsimony analysis of slowly changing 16S rRNA sequence
positions is that the archaebacteria are a group. If one similarly considers
the proposal of Lake et al. (1985) that the eubacteria are specifically
related to the halophilic archaebacteria, then there are additional slowly
changing positions in conflict: 33, 332, 551, 939, 1074, 1083, and 1344. Thus,
when used with the most slowly varying sequence positions in the molecule,
those at which the method should be most reliable, a parsimony analysis gives
the same interkingdom relationships as the distance matrix analysis.
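The bookkeeping behind this comparison can be sketched as a tally. For each alignment column where every group is internally invariant and exactly two nucleotides occur, the sketch records which grouping of the groups that column supports. Three assumptions apply. First, it requires every group to be invariant, which is stricter than the filter described above (conserved within eukaryotes and within eubacteria). Second, the group labels and toy data are hypothetical. Third, it reports 1-based alignment columns, so mapping them to the E. coli position numbers used in the text would need a separate step that is not shown.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List

Split = FrozenSet[FrozenSet[str]]
NUCLEOTIDES = {"A", "C", "G", "U"}


def tally_group_splits(alignment: Dict[str, str], groups: Dict[str, str]) -> Dict[Split, List[int]]:
    """Map each two-way split of the groups to the alignment columns that support it.

    A column counts only if every group is internally invariant there (no gaps or
    ambiguity codes) and exactly two different nucleotides occur across the groups.
    """
    names = list(alignment)
    labels = sorted({groups[name] for name in names})
    length = len(alignment[names[0]])
    support: Dict[Split, List[int]] = defaultdict(list)
    for col in range(length):
        state: Dict[str, str] = {}
        for label in labels:
            bases = {alignment[n][col].upper().replace("T", "U")
                     for n in names if groups[n] == label}
            if len(bases) != 1 or not bases <= NUCLEOTIDES:
                break  # variable (or gapped) within a group: not a slowly changing column
            state[label] = bases.pop()
        else:
            nucleotides = set(state.values())
            if len(nucleotides) == 2:
                split = frozenset(
                    frozenset(label for label, base in state.items() if base == nt)
                    for nt in nucleotides
                )
                support[split].append(col + 1)  # 1-based alignment column
    return dict(support)
```

Counting the columns listed under each split gives the kind of balance-of-evidence comparison made above. Here, three columns support the eukaryote/thermophile grouping, and seven support the grouping of eukaryotes with eubacteria.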

Lake (1987) has argued that none of the above analysis methods adequately
accounts for events in the peripheral branches of a phylogenetic tree that can
mimic the early events which defined the actual branching order, and he has
proposed an analysis technique, "evolutionary parsimony," that is intended to
rectify the potential problem. In particular, the technique seeks to
statistically eliminate any tendency for combinations of transversion-type
mutations in peripheral tree branches to be confused with transversions in the
central branch of a four-organism, unrooted phylogenetic tree. The inference
of the correct branching order then depends on how many sequence positions
have undergone a single transversion (actually any odd number would do) in the
central branch of the tree and no changes at the same positions in any of the
peripheral branches. Rapidly changing positions will not contribute useful
information, since they will almost certainly have undergone one or more
changes in the peripheral branches (which, in a multikingdom phylogeny, span
billions of years). When we apply Lake's analysis technique to the most slowly
changing sequence positions (those that display no intragroup variation within
the eukaryotes or within the eubacteria), we arrive at the same conclusions as
we do with the distance matrix analyses: the archaebacteria are a united
group, distinct from the eubacteria and eukaryotes.
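The core of the technique is a signed count of transversion patterns, computed for each of the three ways of pairing four sequences. The sketch below is our reading of Lake's (1987) invariant and should be checked against the original paper before use. Only columns in which both members of each pair belong to one nucleotide class and the two pairs belong to different classes are counted (purines A, G versus pyrimidines C, U). Each counted column scores +1 if the two pairs are both internally identical or both internally different, and -1 otherwise. Parallel transversions in peripheral branches should then cancel in expectation, while a central-branch transversion gives a positive expected score. Lake's significance test is omitted, and the function names are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

Split = Tuple[Tuple[int, int], Tuple[int, int]]
SPLITS = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]


def _normalize(base: str) -> str:
    return base.upper().replace("T", "U")


def _base_class(base: str) -> Optional[str]:
    """'R' for purines, 'Y' for pyrimidines, None for gaps or ambiguity codes."""
    if base in {"A", "G"}:
        return "R"
    if base in {"C", "U"}:
        return "Y"
    return None


def lake_invariant(seqs: Sequence[str], split: Split) -> int:
    """N(xxyy) + N(xx'yy') - N(xxyy') - N(xx'yy) for the pairing (i, j) | (k, l),
    where x, x' are different bases of one class and y, y' of the other class."""
    (i, j), (k, l) = split
    score = 0
    for column in zip(*seqs):
        a, b, c, d = (_normalize(column[t]) for t in (i, j, k, l))
        classes = [_base_class(x) for x in (a, b, c, d)]
        if None in classes:
            continue
        # Informative only if there is a transversion across the central branch.
        if not (classes[0] == classes[1] and classes[2] == classes[3] and classes[0] != classes[2]):
            continue
        score += 1 if (a == b) == (c == d) else -1
    return score


def evolutionary_parsimony_scores(seqs: Sequence[str]) -> Dict[Split, int]:
    """Invariant for each of the three unrooted four-taxon topologies."""
    if len(seqs) != 4:
        raise ValueError("evolutionary parsimony compares exactly four sequences")
    return {split: lake_invariant(seqs, split) for split in SPLITS}
```

In practice, the columns would first be restricted to the slowly changing positions described above. The topology whose invariant is clearly positive is favored, and the other two should be near zero.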


Plans for the Next Year


a) Complete the "lab manual" for the rapid sequencing of rRNAs.

b) Expand the manual to include the isolation and sequencing of 16S rRNA genes
from samples of natural populations.

c) Work to provide our compilation and alignment of 16S rRNA sequence data in
additional data formats (unfortunately, formats for alignments of multiple
sequences are poorly standardized).

d) Survey the authors of previously published 16S rRNA sequences for published
and unpublished revisions to the sequences. There has been a tendency for
such revisions to appear (when they appear at all) in contexts where they
can easily be overlooked.

e) Cooperate with the University of Wisconsin Genetics Computer Group to
integrate phylogenetic tree inference programs into the package of programs
that they distribute. As an initial step, we have agreed to assist them
in implementing our programs on the University of Wisconsin campus.

f) Transfer our phylogenetic tree inference programs to a supercomputer (most
likely the Cray X-MP at the University of Illinois National Center for
Supercomputing Applications).

g) Continue development of phylogenetic tree inference methods that are less
sensitive to known sources of systematic and random error. Particular
attention is being given to dealing with site-to-site and lineage-to-lineage
variations in the mutation acceptance rate.